[step 1]refine code to support all devices in torch and hot fix for gemma4-unified by wenhuach21 · Pull Request #1879 · intel/auto-round

wenhuach21 · 2026-06-01T08:39:05Z

Description

Please briefly describe your main changes, the motivation.

Type of Change

Bug fix

Related Issues

Fixes or relates to #

Checklist Before Submitting

My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.
The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.
Optimize the functions under manager

for more information, see https://pre-commit.ci

Copilot

Pull request overview

This PR refactors device handling by introducing a unified DeviceManager abstraction and updating existing utilities to use it, with the intent of reducing scattered backend-specific (cuda/xpu/hpu) branching across the codebase.

Changes:

Added auto_round/utils/device_manager.py to centralize backend discovery and runtime ops (sync/cache/memory queries).
Updated auto_round/utils/device.py to route device counting, selection, memory clearing, and memory queries through the new device manager APIs.
Updated auto_round/auto_scheme/delta_loss.py to synchronize via the active device manager and broaden “non-CPU device” checks.
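
As a rough illustration of the kind of abstraction described above, here is a minimal sketch of a unified device manager. The class name `DeviceManagerSketch`, the probing order, and the method names are assumptions made for this example; they are not taken from `auto_round/utils/device_manager.py`. The namespace is passed in (in practice it would be the `torch` module) so the sketch can be exercised without PyTorch installed.

```python
from types import SimpleNamespace


class DeviceManagerSketch:
    """Hypothetical stand-in for a unified device manager (not the PR's API)."""

    # Backend names probed in order; the order is an assumption of this sketch.
    CANDIDATES = ("cuda", "xpu", "hpu")

    def __init__(self, torch_like):
        # `torch_like` is any namespace exposing backend submodules,
        # for example the `torch` module itself.
        self._backend_name = "cpu"
        self._backend = None
        for name in self.CANDIDATES:
            module = getattr(torch_like, name, None)
            is_available = getattr(module, "is_available", None)
            if callable(is_available) and is_available():
                self._backend_name, self._backend = name, module
                break

    @property
    def device_type(self):
        return self._backend_name

    def device_count(self):
        # CPU-only environments report zero accelerator devices.
        return self._backend.device_count() if self._backend else 0

    def synchronize(self):
        # No-op on CPU, otherwise delegate to the active backend.
        if self._backend is not None:
            self._backend.synchronize()

    def empty_cache(self):
        if self._backend is not None:
            self._backend.empty_cache()


# Usage with a fake namespace that mimics an XPU-only machine.
calls = []
fake_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False),
    xpu=SimpleNamespace(
        is_available=lambda: True,
        device_count=lambda: 2,
        synchronize=lambda: calls.append("sync"),
        empty_cache=lambda: calls.append("empty_cache"),
    ),
)
manager = DeviceManagerSketch(fake_torch)
manager.synchronize()
manager.empty_cache()
```

Callers then ask the manager to synchronize or clear memory instead of branching on cuda/xpu/hpu themselves, which is the reduction in scattered backend checks that the overview describes.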

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| auto_round/utils/device.py | Replaces backend-specific device/memory logic with `DeviceManager` calls. |
| auto_round/utils/device_manager.py | New unified device backend abstraction (discovery + runtime + memory APIs). |
| auto_round/auto_scheme/delta_loss.py | Uses `DeviceManager` for synchronization and generalized non-CPU device checks. |
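
The "generalized non-CPU device checks" mentioned for `delta_loss.py` could look roughly like the sketch below. The helper names `device_type_of` and `is_non_cpu` are hypothetical and chosen for this example only; the actual check in the PR may differ.

```python
def device_type_of(device):
    """Return the backend part of a device spec such as "cuda:0" or "xpu"."""
    # str() also covers objects that print as "cuda:0", which is how
    # torch.device instances render.
    return str(device).split(":", 1)[0].strip().lower()


def is_non_cpu(device):
    """True for any accelerator-style device, not only "cuda"."""
    return device_type_of(device) != "cpu"
```

Compared with a test such as `"cuda" in device`, this treats `"xpu:0"` or `"hpu"` the same way as `"cuda:0"`, so a new backend needs no extra branch.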

chensuyue · 2026-06-01T09:25:00Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-06-01T09:25:10Z

Azure Pipelines successfully started running 1 pipeline(s).


chensuyue · 2026-06-02T10:09:44Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-06-02T10:09:54Z

Azure Pipelines successfully started running 1 pipeline(s).


Copilot

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 8 comments.


wenhuach21 · 2026-06-04T09:23:39Z

Some functions have not been refined yet. I would prefer to merge this PR first and refine them later to avoid conflicts. Please review when you are free.

chensuyue · 2026-06-04T10:04:19Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-06-04T10:04:30Z

Azure Pipelines successfully started running 1 pipeline(s).


chensuyue · 2026-06-04T12:40:54Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-06-04T12:41:03Z

Azure Pipelines successfully started running 1 pipeline(s).

lvliang-intel · 2026-06-05T07:34:22Z

    return pipe, model
+
+
+_PRE_DEFINED_FIXED_ATTR = {"gemma4_unified": {"has_variable_block_shape": True}}


has_variable_block_shape = True causes every block to cache its own inputs,
which increases VRAM usage by approximately N× (N = total number of blocks).
Compared to the original Gemma4 approach that uses a per-layer forward
monkey-patch to dynamically rebuild position_embeddings at replay time
(zero extra cache), this is a significant memory trade-off.
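
To make the approximate N× figure concrete, here is a back-of-the-envelope sketch. The sample count, sequence length, hidden size, dtype width, and block count are illustrative assumptions, not measurements from Gemma4 or from AutoRound's caching code, and the function names are hypothetical.

```python
def cached_input_bytes(num_samples, seq_len, hidden_size, bytes_per_elem=2):
    """Size of one cached block-input tensor (bytes_per_elem=2 assumes bf16/fp16)."""
    return num_samples * seq_len * hidden_size * bytes_per_elem


def estimate_cache_bytes(num_blocks, per_block_bytes, variable_block_shape):
    """Total input-cache size under the two strategies discussed above."""
    # With per-block caching every block keeps its own copy of its inputs;
    # otherwise a single cached input is assumed to be reused for all blocks.
    copies = num_blocks if variable_block_shape else 1
    return per_block_bytes * copies


GIB = 1024**3
per_block = cached_input_bytes(num_samples=128, seq_len=2048, hidden_size=4096)
shared = estimate_cache_bytes(32, per_block, variable_block_shape=False)
per_block_cache = estimate_cache_bytes(32, per_block, variable_block_shape=True)
print(f"shared cache:    {shared / GIB:.0f} GiB")           # 2 GiB
print(f"per-block cache: {per_block_cache / GIB:.0f} GiB")  # 64 GiB
```

Under these assumed numbers the cache grows from 2 GiB to 64 GiB for 32 blocks, which is the N× growth the comment warns about.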

chensuyue · 2026-06-05T08:00:20Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-06-05T08:00:59Z

Azure Pipelines successfully started running 1 pipeline(s).

refine devices

bbe1fb1

Copilot AI review requested due to automatic review settings June 1, 2026 08:39

Copilot started reviewing on behalf of wenhuach21 June 1, 2026 08:39

[pre-commit.ci] auto fixes from pre-commit.com hooks

1aa5efd


Copilot AI reviewed Jun 1, 2026


Comment thread auto_round/utils/device_manager.py Outdated

Comment thread auto_round/utils/device_manager.py

Comment thread auto_round/utils/device_manager.py

wenhuach21 and others added 6 commits June 2, 2026 16:28

refine devices

fc14c01

Merge branch 'refine_device_1' of https://github.com/intel/auto-round …

705bd30

…into refine_device_1 # Conflicts: # auto_round/utils/device_manager.py

[pre-commit.ci] auto fixes from pre-commit.com hooks

9b19e74


refine devices

a9f4b45

refine devices

2e03b7c

[pre-commit.ci] auto fixes from pre-commit.com hooks

5775f29


wenhuach21 changed the title ~~refine devices~~ refine code to support all devices in torch Jun 2, 2026

wenhuach21 and others added 14 commits June 3, 2026 14:36

clean a little

a4141d3

[pre-commit.ci] auto fixes from pre-commit.com hooks

4db2409


update

2a32290

Merge branch 'refine_device_1' of https://github.com/intel/auto-round …

5026a90

…into refine_device_1

[pre-commit.ci] auto fixes from pre-commit.com hooks

9977980


fix ut

d5cd6aa

Merge branch 'refine_device_1' of https://github.com/intel/auto-round …

182168f

…into refine_device_1

[pre-commit.ci] auto fixes from pre-commit.com hooks

181e664


fix code scan

ab95d3d

Merge branch 'refine_device_1' of https://github.com/intel/auto-round …

8882122

…into refine_device_1

[pre-commit.ci] auto fixes from pre-commit.com hooks

a0bee00


fix some issues

905df02

Merge branch 'refine_device_1' of https://github.com/intel/auto-round …

cc73855

…into refine_device_1 # Conflicts: # auto_round/utils/device_manager.py

[pre-commit.ci] auto fixes from pre-commit.com hooks

78c15bf


pre-commit-ci Bot and others added 4 commits June 4, 2026 08:18

[pre-commit.ci] auto fixes from pre-commit.com hooks

0aef05b


update

ff60860

Merge branch 'refine_device_1' of https://github.com/intel/auto-round …

8f563c5

…into refine_device_1 Signed-off-by: Wenhua Cheng <wenhua.cheng@intel.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

6efade3


wenhuach21 requested review from WeiweiZhang1, Copilot, n1ck-guo and yiliu30 June 4, 2026 09:03

Copilot started reviewing on behalf of wenhuach21 June 4, 2026 09:04

Copilot AI reviewed Jun 4, 2026


wenhuach21 and others added 5 commits June 4, 2026 17:19

Potential fix for pull request finding

542cf94

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

919594b


Potential fix for pull request finding

7dc8fc2

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

07e86ab

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Merge branch 'main' into refine_device_1

e351f98

wenhuach21 and others added 4 commits June 4, 2026 19:26

hot fix for gemma4-12b

cf17da9

Merge branch 'refine_device_1' of https://github.com/intel/auto-round …

7c3573d

…into refine_device_1

update

65ef503

[pre-commit.ci] auto fixes from pre-commit.com hooks

5f227f5


wenhuach21 changed the title ~~[step 1]refine code to support all devices in torch~~ [step 1]refine code to support all devices in torch and hot fix for gemma4-unified Jun 4, 2026

lvliang-intel reviewed Jun 4, 2026


Comment thread auto_round/utils/device_manager.py Outdated

tiny change

a99fbdf

lvliang-intel reviewed Jun 5, 2026


4 participants
